Disputation
16 April 2024
University of Mannheim
Which methods can we use to classify data from open-ended survey questions?
Can we leverage these methods to make empirical contributions to substantive questions?
Motivation:
1️⃣ The increase in methods for collecting natural language (e.g., smartphone surveys with voice technologies) requires an evaluation of the available methods.
2️⃣ The special structure of open-ended survey answers (e.g., shortness, lack of context) requires testing ML methods, e.g., word embeddings, for the survey context.
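Why word embeddings matter for short, low-context answers can be illustrated with a toy cosine-similarity check. The vectors below are invented four-dimensional stand-ins (real embeddings are learned from large corpora, typically with hundreds of dimensions); the point is only that semantically related words receive similar vectors even when two short answers share no exact tokens.

```python
import math

# Hypothetical 4-dimensional embeddings (invented values for illustration);
# real word embeddings are learned from large text corpora.
emb = {
    "trust":   [0.8, 0.1, 0.3, 0.0],
    "confide": [0.7, 0.2, 0.4, 0.1],
    "banana":  [0.0, 0.9, 0.1, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Related words score higher than unrelated ones, which is what lets
# embedding-based classifiers generalize across short answers with
# different surface wording.
assert cosine(emb["trust"], emb["confide"]) > cosine(emb["trust"], emb["banana"])
```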
Table 1. Overview of methods for classifying open-ended survey responses. Source: Own depiction.
| Study 1 | Study 2 | Study 3 |
|---|---|---|
| How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning | Open-ended survey questions: A comparison of information content in text and audio response formats | Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys? |
Landesvatter, C., & Bauer, P. C. (2024). How Valid Are Trust Survey Measures? New Insights From Open-Ended Probing Data and Supervised Machine Learning. Sociological Methods & Research, 0(0). https://doi.org/10.1177/00491241241234871
Background: ongoing debates about which type of trust survey researchers measure with traditional survey items (i.e., the equivalence debate, cf. Bauer & Freitag 2018)
Research Question: How valid are traditional trust survey measures?
Questionnaire Design: 5 open-ended questions per respondent, block-randomized order
Data: U.S. non-probability sample; \(n\)=1,500 with 7,497 open answers
Supervised classification approach:
| ID | Measure | Trust | Probing Answer | Associations (known others) | Associations (sentiment) |
|---|---|---|---|---|---|
| 123 | Most people | 0.33 | I was thinking of people I don’t know personally. | 0 (No) | 0 (neutral/positive) |
| 3139 | Most people | 0.17 | Tourists that come to our little village. I tend to be very wary of them. | 0 (No) | 1 (negative) |
| 2980 | Stranger | 0 | No one in particular, but I don’t think I could trust anyone ever again. | 0 (No) | 1 (negative) |
| 4286 | Watching a loved one | 0 | A former neighbor of mine who was a single father with a son close to my son’s age. | 1 (Yes) | 0 (neutral/positive) |
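The supervised workflow behind such codes can be sketched with a toy Naive Bayes classifier built from the standard library. This is only an illustration of the general approach (train on human-labeled probing answers, then predict codes for new answers); the study itself uses more capable models such as Random Forest or BERT, and the four training examples below are invented paraphrases of the table rows.

```python
import math
from collections import Counter, defaultdict

# Invented toy training data mimicking the sentiment coding scheme above:
# probing answers labeled 0 (neutral/positive) or 1 (negative).
train = [
    ("i was thinking of people i do not know personally", 0),
    ("a former neighbor of mine with a son close to my son's age", 0),
    ("i tend to be very wary of tourists", 1),
    ("i do not think i could trust anyone ever again", 1),
]

def tokenize(text):
    return text.lower().split()

# Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing.
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for tok in tokenize(text):
        word_counts[label][tok] += 1
        vocab.add(tok)

def predict(text):
    """Return the most probable label for a new probing answer."""
    scores = {}
    for label in class_counts:
        # log prior + sum of smoothed log likelihoods
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / total)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("i am wary of strangers"))  # classified as negative (1)
```

In practice the same train/predict split applies, only with richer features (e.g., embeddings) and stronger models; with roughly 1,500 labeled examples per study, model choice matters (see the Random Forest versus BERT comparison below).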
Landesvatter, C., & Bauer, P. C. (February 2024). Open-ended survey questions: A comparison of information content in text and audio response formats. Working Paper submitted to Public Opinion Quarterly.
Background: requests for spoken answers are assumed to trigger an open narration with more intuitive and spontaneous answers (e.g., Gavras et al. 2022)
Research Question: Are there differences in information content between responses given in voice and text formats?
Experimental Design: random assignment into either the text or voice condition
Operationalization of information content in open answers via measures from information theory and machine learning
Questionnaire Design: 9 open-ended questions per respondent, block-randomized order
Data: U.S. non-probability sample; \(n\)=1,461 with \(n_{text}\)=800 and \(n_{audio}\)=661
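One standard information-theoretic measure of this kind is the Shannon entropy of an answer's word distribution: repetitive answers score low, lexically varied answers score high. The function below is a minimal stdlib sketch of that idea; the paper's exact operationalization may differ, and the two example answers are invented.

```python
import math
from collections import Counter

def word_entropy(text):
    """Shannon entropy (in bits) of the word frequency distribution.

    A simple proxy for the information content of an open answer:
    higher entropy means a more varied (less repetitive) vocabulary.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A repetitive answer carries no word-level surprise (entropy 0.0),
# while four distinct words yield 2.0 bits.
print(word_entropy("clear detailed thoughtful answer"))  # 2.0
```

Comparing such scores between the text and voice conditions is one way to test whether one response format elicits richer answers.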
Landesvatter, C., & Bauer, P. C. (March 2024). Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys? Working Paper submitted to American Political Science Review.
Background: the conventional notion that trust originates from informed, rational, and consequential judgments is challenged by the idea of an “affect-based” form of (political) trust (e.g., Theiss-Morse and Barton 2017)
Research Question: Are individual trust judgments in surveys driven by affective rationales?
Questionnaire Design: voice condition only
Large Language Models (LLMs) facilitate the accessibility and implementation of semi-automated methods.
Traditional semi-automated methods, e.g., supervised ML, require sufficient, high-quality training data (i.e., labeled examples).
Surveys often do not provide thousands of documents.
LLMs allow less resource-intensive, domain-specific fine-tuning and remove the need to build complex systems from scratch.
E.g., Study 1: Random Forest with 1,500 labeled examples versus BERT
An increasing number of approaches makes it possible to reduce manual input to a minimum.
The final choice of approach depends on:
difficulty of the given task (e.g., general versus specific codes)
size of the available dataset (e.g., n, splits by experimental conditions)
structure of the open answers (e.g., length, amount of context → this depends on the question design)
the amount and state of previous research (e.g., available code schemes)
desired accuracy and transparency
available resources (e.g., human power, computational power (GPU), time resources)
Landesvatter: Methods for the Classification of Data from Open-Ended Questions in Surveys